Genome-Wide Identification of Human Functional DNA Using a Neutral Indel Model
Abstract
It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human-mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified.
Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
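The reasoning in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes, purely for illustration, that under neutrality every alignment position independently starts a gap with a constant probability `p`, so that ungapped segment lengths follow a geometric distribution. All function names and the example inputs are hypothetical. The last function reproduces the arithmetic behind the "more than half" statement, under the added assumption that all annotated protein-coding sequence lies within the fraction under indel-purifying selection.

```python
import math


def geometric_tail(p, k):
    """P(L >= k) for an ungapped length L that is geometric on k = 1, 2, ...

    Assumption (not from the paper): each position starts a gap with
    independent probability p, so P(L >= k) = (1 - p)^(k - 1).
    """
    return (1.0 - p) ** (k - 1)


def expected_neutral_count(n_segments, p, k):
    """Expected number of neutral segments of length >= k among n_segments."""
    return n_segments * geometric_tail(p, k)


def false_discovery_rate(observed_long, n_segments, p, k):
    """Share of observed long segments that the neutral model alone explains."""
    if observed_long == 0:
        return 0.0
    return min(1.0, expected_neutral_count(n_segments, p, k) / observed_long)


def length_threshold(n_segments, p, max_expected):
    """Smallest length k whose expected neutral count is <= max_expected."""
    # Solve n * (1 - p)^(k - 1) <= max_expected for k; log(1 - p) is negative.
    steps = math.ceil(math.log(max_expected / n_segments) / math.log(1.0 - p))
    return max(1, 1 + steps)


def noncoding_share(coding_pct, selected_pct):
    """Fraction of selected sequence that is non-coding.

    Assumes all coding sequence is contained in the selected fraction.
    """
    return 1.0 - coding_pct / selected_pct


if __name__ == "__main__":
    # Figures from the abstract: 1.2% coding, 2.56% to 3.25% under selection.
    for selected in (2.56, 3.25):
        print(f"{selected}% selected: {noncoding_share(1.2, selected):.1%} non-coding")
```

With the abstract's figures, the non-coding share comes out at roughly 53% for the 2.56% bound and 63% for the 3.25% bound, which is consistent with the statement that more than half of the functional sequence is non-protein-coding. Segments longer than the value returned by `length_threshold` are the ones such a neutral model would struggle to explain.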
Journal:
- PLoS Computational Biology
Volume 2
Publication date: 2006